- Europe > France > Île-de-France > Paris > Paris (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
On the Role of Server Momentum in Federated Learning
Sun, Jianhui, Wu, Xidong, Huang, Heng, Zhang, Aidong
Federated Averaging (FedAvg) is known to experience convergence issues when faced with significant client system heterogeneity and data heterogeneity. Server momentum has been proposed as an effective mitigation. However, existing work on server momentum is restrictive in its momentum formulation, does not properly schedule hyperparameters, and focuses only on system-homogeneous settings, leaving the role of server momentum an under-explored problem. In this paper, we propose a general framework for server momentum that (a) covers a large class of momentum schemes that are unexplored in federated learning (FL), (b) enables a popular stagewise hyperparameter scheduler, and (c) allows heterogeneous and asynchronous local computing. We provide a rigorous convergence analysis for the proposed framework. To the best of our knowledge, this is the first work that thoroughly analyzes the performance of server momentum with a hyperparameter scheduler and system heterogeneity. Extensive experiments validate the effectiveness of the proposed framework.
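Roughly, the server treats the averaged client update as a pseudo-gradient and applies a momentum step to it. Below is a minimal numpy sketch under that reading, with a toy least-squares objective per client, a heavy-ball-style momentum buffer, and randomly varying local step counts to mimic heterogeneous local computing; the names, hyperparameters, and toy data are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each client holds its own least-squares problem (data heterogeneity).
n_clients, dim, n_samples = 8, 5, 50
A = [rng.normal(size=(n_samples, dim)) for _ in range(n_clients)]
b = [a @ rng.normal(size=dim) + 0.1 * rng.normal(size=n_samples) for a in A]

def local_sgd(w, a, y, steps, lr):
    """Run a few local gradient steps on one client's objective."""
    for _ in range(steps):
        w = w - lr * a.T @ (a @ w - y) / len(y)
    return w

w = np.zeros(dim)          # global (server) model
m = np.zeros(dim)          # server momentum buffer
beta, server_lr = 0.9, 1.0

for rnd in range(50):
    deltas = []
    for i in range(n_clients):
        local_steps = rng.integers(1, 5)   # heterogeneous local computing
        w_i = local_sgd(w.copy(), A[i], b[i], local_steps, lr=0.05)
        deltas.append(w - w_i)             # pseudo-gradient sent to the server
    d = np.mean(deltas, axis=0)            # server aggregates client updates
    m = beta * m + d                       # heavy-ball-style server momentum
    w = w - server_lr * m                  # server step

print("final global model:", w)
```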
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > Virginia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
- Law (0.45)
- Information Technology > Security & Privacy (0.45)
Approximation properties of slice-matching operators
Li, Shiying, Moosmueller, Caroline
Iterative slice-matching procedures are efficient schemes for transferring a source measure to a target measure, especially in high dimensions. These schemes have been successfully used in applications such as color transfer and shape retrieval, and are guaranteed to converge under regularity assumptions. In this paper, we explore approximation properties related to a single step of such iterative schemes by examining an associated slice-matching operator, which depends on a source measure, a target measure, and slicing directions. In particular, we demonstrate an invariance property with respect to the source measure, an equivariance property with respect to the target measure, and Lipschitz continuity with respect to the slicing directions. We furthermore establish error bounds for approximating the target measure by one step of the slice-matching scheme and characterize situations in which the slice-matching operator recovers the optimal transport map between two measures. We also investigate connections to affine registration problems with respect to (sliced) Wasserstein distances. These connections can also be viewed as extensions of the invariance and equivariance properties of the slice-matching operator and illustrate the extent to which slice-matching schemes incorporate affine effects.
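For intuition, a single slice-matching step for equal-size empirical measures is commonly realized as: project source and target onto each slicing direction, apply the one-dimensional optimal transport (monotone, sorting-based) map on each projection, and average the resulting moves. The numpy sketch below follows that reading; the operator analyzed in the paper may differ in details such as how directions are weighted.

```python
import numpy as np

def one_dim_ot_displacement(s, t):
    """Displacement of the 1D optimal transport (monotone) map between two
    equal-size empirical measures, evaluated at the source points s."""
    order = np.argsort(s)
    disp = np.empty_like(s)
    disp[order] = np.sort(t) - s[order]   # sorted source -> sorted target
    return disp

def slice_matching_step(X, Y, thetas):
    """One slice-matching step: match the 1D projections of source X and
    target Y along each direction and average the resulting moves."""
    update = np.zeros_like(X)
    for theta in thetas:
        theta = theta / np.linalg.norm(theta)
        disp = one_dim_ot_displacement(X @ theta, Y @ theta)
        update += disp[:, None] * theta[None, :]
    return X + update / len(thetas)

# Tiny example: move a Gaussian blob toward a shifted one.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
Y = rng.normal(size=(200, 2)) + np.array([3.0, -1.0])
thetas = rng.normal(size=(10, 2))
X_next = slice_matching_step(X, Y, thetas)
print("mean before:", X.mean(axis=0), "after:", X_next.mean(axis=0))
```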
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel (0.04)
Measure transfer via stochastic slicing and matching
Li, Shiying, Moosmueller, Caroline
This paper studies iterative schemes for measure transfer and approximation problems, which are defined through a slicing-and-matching procedure. Similar to the sliced Wasserstein distance, these schemes benefit from the availability of closed-form solutions for the one-dimensional optimal transport problem and the associated computational advantages. While such schemes have already been successfully utilized in data science applications, few results on their convergence are available. The main contribution of this paper is an almost sure convergence proof for stochastic slicing-and-matching schemes. The proof builds on an interpretation as a stochastic gradient descent scheme on the Wasserstein space. Numerical examples of step-wise image morphing are presented as well.
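A stochastic variant of the same primitive, for intuition: at each iteration, draw a fresh random slicing direction, match the one-dimensional projections, and take a step, which is the sense in which the scheme resembles stochastic gradient descent on the Wasserstein space. The single direction per iteration and the step size of 1.0 below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(2)

def match_1d(s, t):
    """1D optimal transport (monotone) map between equal-size empirical
    measures, returned as a displacement at the source points s."""
    order = np.argsort(s)
    disp = np.empty_like(s)
    disp[order] = np.sort(t) - s[order]
    return disp

# Stochastic slicing-and-matching: one random direction per iteration.
X = rng.normal(size=(300, 2))                                  # source samples
Y = rng.normal(size=(300, 2)) @ np.diag([0.5, 2.0]) + 4.0      # target samples
step = 1.0

for it in range(500):
    theta = rng.normal(size=2)
    theta /= np.linalg.norm(theta)
    disp = match_1d(X @ theta, Y @ theta)
    X = X + step * disp[:, None] * theta[None, :]              # move along the slice

print("source mean after transfer:", X.mean(axis=0), "target mean:", Y.mean(axis=0))
```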
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- North America > United States > Michigan (0.04)
- (2 more...)
A Simple Way to Learn Metrics Between Attributed Graphs
Kaloga, Yacouba, Borgnat, Pierre, Habrard, Amaury
The choice of good distances and similarity measures between objects is important for many machine learning methods. Therefore, many metric learning algorithms have been developed in recent years, mainly for Euclidean data, in order to improve the performance of classification or clustering methods. However, due to the difficulty of establishing computable, efficient, and differentiable distances between attributed graphs, few metric learning algorithms adapted to graphs have been developed despite the strong interest of the community. In this paper, we address this issue by proposing a new Simple Graph Metric Learning (SGML) model with few trainable parameters, based on Simple Graph Convolutional Neural Networks (SGCN) and elements of optimal transport theory. This model allows us to build an appropriate distance from a database of labeled (attributed) graphs to improve the performance of simple classification algorithms such as k-NN. This distance can be trained quickly while maintaining good performance, as illustrated by the experimental studies presented in this paper.
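To make the ingredients concrete, the sketch below combines an SGC-style encoder (feature propagation through a normalized adjacency followed by one linear map, whose weight plays the role of the trainable parameters) with a sliced, quantile-based optimal-transport distance between the two sets of node embeddings. This illustrates the building blocks named in the abstract, not the SGML model itself; all sizes and names are assumptions.

```python
import numpy as np

def sgc_embed(adj, feats, weight, k=2):
    """Simplified graph convolution: propagate node features k times with the
    symmetrically normalized adjacency (with self-loops), then apply one linear map."""
    a = adj + np.eye(adj.shape[0])
    d = 1.0 / np.sqrt(a.sum(axis=1))
    a_norm = a * d[:, None] * d[None, :]
    h = feats
    for _ in range(k):
        h = a_norm @ h
    return h @ weight

def sliced_w2(x, y, thetas):
    """Sliced Wasserstein-2 style distance between two sets of node embeddings
    (possibly of different sizes), via quantile matching on each projection."""
    qs = np.linspace(0.0, 1.0, 50)
    total = 0.0
    for theta in thetas:
        theta = theta / np.linalg.norm(theta)
        px, py = np.quantile(x @ theta, qs), np.quantile(y @ theta, qs)
        total += np.mean((px - py) ** 2)
    return np.sqrt(total / len(thetas))

rng = np.random.default_rng(3)
# Two small attributed graphs with different node counts.
adj1, f1 = (rng.random((6, 6)) < 0.4).astype(float), rng.normal(size=(6, 3))
adj2, f2 = (rng.random((9, 9)) < 0.4).astype(float), rng.normal(size=(9, 3))
adj1, adj2 = np.maximum(adj1, adj1.T), np.maximum(adj2, adj2.T)  # make symmetric
W = rng.normal(size=(3, 4))           # the (trainable) parameters of the metric
thetas = rng.normal(size=(8, 4))
print("graph distance:", sliced_w2(sgc_embed(adj1, f1, W), sgc_embed(adj2, f2, W), thetas))
```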
- Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (0.48)
- Research Report > Experimental Study (0.34)
Wrapping up my journey scaling Software 2.0 development for AV
I dedicated the past 5 years of my life to figuring out how to build the right software infrastructure to enable Software 2.0 development in complex applications like Autonomous Vehicles. It was an incredibly challenging, yet rewarding and humbling journey, and I thank all my amazing colleagues at NVIDIA for it. While I'm exploring what's next and digesting this amazing journey, I wanted to share some of my learnings and views on this, so it can maybe help some of you. First things first, if you're not familiar with what Software 2.0 is, please check out Andrej's great original write-up on the topic. I introduced a similar framing back then, which I used to guide our development efforts, and I will recap it here concisely.
Distribution Regression with Sliced Wasserstein Kernels
Meunier, Dimitri, Pontil, Massimiliano, Ciliberto, Carlo
The problem of learning functions over spaces of probabilities - or distribution regression - is gaining significant interest in the machine learning community. A key challenge behind this problem is to identify a suitable representation capturing all relevant properties of the underlying functional mapping. A principled approach to distribution regression is provided by kernel mean embeddings, which lift the kernel-induced similarity on the input domain to the level of probability distributions. This strategy effectively tackles the two-stage sampling nature of the problem, enabling one to derive estimators with strong statistical guarantees, such as universal consistency and excess risk bounds. However, kernel mean embeddings implicitly hinge on the maximum mean discrepancy (MMD), a metric on probabilities, which may fail to capture key geometrical relations between distributions. In contrast, optimal transport (OT) metrics are potentially more appealing, as documented by the recent literature on the topic. In this work, we propose the first OT-based estimator for distribution regression. We build on the Sliced Wasserstein distance to obtain an OT-based representation. We study the theoretical properties of a kernel ridge regression estimator based on this representation, for which we prove universal consistency and excess risk bounds. Preliminary experiments complement our theoretical findings by showing the effectiveness of the proposed approach and comparing it with MMD-based estimators.
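A minimal sketch of the pipeline described above: estimate squared sliced Wasserstein distances between the sampled input distributions, plug them into a Gaussian kernel, and run kernel ridge regression. The toy two-stage data, bandwidth choice, and regularization below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def sliced_w2_sq(x, y, thetas, qs):
    """Squared sliced Wasserstein-2 distance between two samples (quantile matching)."""
    total = 0.0
    for theta in thetas:
        px, py = np.quantile(x @ theta, qs), np.quantile(y @ theta, qs)
        total += np.mean((px - py) ** 2)
    return total / len(thetas)

rng = np.random.default_rng(4)
dim, n_dist, n_pts = 3, 40, 100
qs = np.linspace(0.0, 1.0, 64)
thetas = rng.normal(size=(20, dim))
thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)

# Toy two-stage sampled data: each input is a sample from a Gaussian whose
# mean norm determines the scalar label we want to regress.
means = rng.normal(size=(n_dist, dim))
samples = [m + rng.normal(size=(n_pts, dim)) for m in means]
labels = np.linalg.norm(means, axis=1)

# Gram matrix of the SW-based Gaussian kernel, then kernel ridge regression.
D2 = np.array([[sliced_w2_sq(a, b, thetas, qs) for b in samples] for a in samples])
sigma2, lam = np.median(D2), 1e-3
K = np.exp(-D2 / (2 * sigma2))
alpha = np.linalg.solve(K + lam * np.eye(n_dist), labels)

pred = K @ alpha
print("train RMSE:", np.sqrt(np.mean((pred - labels) ** 2)))
```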
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Italy (0.04)
Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections
Nadjahi, Kimia, Durmus, Alain, Jacob, Pierre E., Badeau, Roland, Şimşekli, Umut
The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling random projections and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation. We derive non-asymptotic guarantees for our approach and show that the approximation error goes to zero as the dimension increases, under a weak dependence condition on the data distribution. We validate our theoretical findings on synthetic datasets, and illustrate the proposed approximation on a generative modeling problem.
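For context, the sketch below shows the standard Monte Carlo approximation of SW that the paper improves on, next to a simple deterministic proxy in the spirit of the concentration argument (each projection treated as roughly Gaussian, combined through the closed-form 1D Wasserstein distance between Gaussians). The proxy is one plausible instantiation of that idea, not the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 200, 2000
X = rng.normal(size=(n, d)) @ np.diag(np.linspace(0.5, 1.5, d))        # sample from mu
Y = rng.normal(size=(n, d)) @ np.diag(np.linspace(0.5, 1.5, d)) + 0.2  # sample from nu

def mc_sw2_sq(x, y, n_proj, rng):
    """Standard Monte Carlo approximation of SW_2^2 with random projections."""
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=x.shape[1])
        theta /= np.linalg.norm(theta)
        total += np.mean((np.sort(x @ theta) - np.sort(y @ theta)) ** 2)
    return total / n_proj

def gaussian_sw2_sq(x, y):
    """Deterministic proxy (illustrative): treat each 1D projection as roughly
    Gaussian, with squared mean gap averaged over the sphere (||dm||^2 / d) and
    a variance term built from tr(Sigma)/d, via the closed-form 1D W2 between Gaussians."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    sx = np.sqrt(np.trace(np.cov(x, rowvar=False)) / x.shape[1])
    sy = np.sqrt(np.trace(np.cov(y, rowvar=False)) / y.shape[1])
    return np.sum((mx - my) ** 2) / x.shape[1] + (sx - sy) ** 2

print("Monte Carlo SW^2 :", mc_sw2_sq(X, Y, 200, rng))
print("Gaussian proxy   :", gaussian_sw2_sq(X, Y))
```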
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (2 more...)
A New Theory for Sketching in Linear Regression
Large datasets create opportunities as well as analytic challenges. A recent development is to use random projection or sketching methods for dimension reduction in statistics and machine learning. In this work, we study the statistical performance of sketching algorithms for linear regression. Suppose we randomly project the data matrix and the outcome using a random sketching matrix that reduces the sample size, and then do linear regression on the resulting data. How much do we lose compared to the original linear regression? The existing theory does not give a precise enough answer, and this has been a bottleneck for using random projections in practice. In this paper, we introduce a new mathematical approach to the problem, relying on very recent results from asymptotic random matrix theory and free probability theory. This is a perfect fit, as the sketching matrices are random in practice. We allow the dimension and sample sizes to have an arbitrary ratio. We study the most popular sketching methods in a unified framework, including random projection methods (Gaussian and i.i.d. projections, uniform orthogonal projections, subsampled randomized Hadamard transforms) as well as sampling methods (including uniform, leverage-based, and greedy sampling). We find precise and simple expressions for the accuracy loss of these methods. These go beyond classical Johnson-Lindenstrauss-type results because they are exact rather than bounds up to constants. Our theoretical formulas are surprisingly accurate in extensive simulations and on two empirical datasets.
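A minimal simulation of the setup: sketch the data matrix and the response down from n to m rows, fit least squares on the sketched data, and compare the estimation error with full OLS and with uniform row subsampling. The dimensions and Gaussian design below are illustrative, not tied to the paper's asymptotic regime.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, m = 5000, 50, 500          # samples, features, sketched sample size
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(size=n)

def ols(a, b):
    """Least-squares fit via lstsq (stable even if a is rank-deficient)."""
    return np.linalg.lstsq(a, b, rcond=None)[0]

beta_full = ols(X, y)

# Gaussian sketch: reduce the sample size from n to m by random projection,
# then run ordinary least squares on the sketched data (S X, S y).
S = rng.normal(size=(m, n)) / np.sqrt(m)
beta_sketch = ols(S @ X, S @ y)

# Uniform row subsampling sketch, for comparison.
idx = rng.choice(n, size=m, replace=False)
beta_sub = ols(X[idx], y[idx])

err = lambda b: np.linalg.norm(b - beta) ** 2
print("full OLS error        :", err(beta_full))
print("Gaussian sketch error :", err(beta_sketch))
print("uniform sampling error:", err(beta_sub))
```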
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania (0.04)
- Asia > East Asia (0.04)